Regularization effect on autoencoder

1. Data prep
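Since the decoder reconstructs 28x28 images flattened to 784-dimensional vectors, data prep presumably scales pixel values to [0, 1] and flattens each image. A minimal NumPy sketch (the `images` array is synthetic stand-in data, not the real dataset):

```python
import numpy as np

# Synthetic stand-in for 28x28 grayscale images with uint8 pixels in [0, 255].
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Scale pixels to [0, 1] and flatten each image to a 784-dim vector.
x = images.astype(np.float32) / 255.0
x = x.reshape(len(x), -1)

print(x.shape)  # (100, 784)
```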

2. Model architecture

The model minimizes a total loss, $L_T$, composed of the ordinary autoencoder reconstruction loss, $L$, plus a regularization term, $R$.

Encoding layer:
$h = \alpha_e(W_1 x + b_1)$, where $\alpha_e(\cdot)$ is the ReLU activation function and $h$ has 196 hidden units. Hence $h$ is a 196x1 vector in a 196-dimensional latent space, $W_1$ is a 196x784 weight matrix, and $b_1$ is a 196x1 bias vector.
Decoding layer:
$x' = \alpha_d(W_2 h + b_2)$, where $\alpha_d(\cdot)$ is the sigmoid activation function and $x'$ is the output of the autoencoder, optimized to reconstruct the input $x$.
Loss function:
$L_T = L + R = \|x - x'\|^2 + \lambda_a\sum_i |h_i| + \lambda_k\sum_{i,j} |W_{1,ij}|$, where $i$ indexes the hidden units, and $\lambda_a$ and $\lambda_k$ are the activity- and kernel-regularization coefficients respectively; L1 regularization is used for both terms.
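To make the pieces concrete, here is a NumPy sketch of one forward pass and the total loss for a single input; the weights are random and the regularization coefficients are illustrative, purely to show how each term is computed:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 784, 196            # input dimension, number of hidden units
lam_a, lam_k = 1e-4, 1e-5  # illustrative activity/kernel coefficients

W1, b1 = rng.normal(0, 0.05, (k, d)), np.zeros(k)
W2, b2 = rng.normal(0, 0.05, (d, k)), np.zeros(d)
x = rng.random(d)          # stand-in input with pixels in [0, 1]

h = np.maximum(0.0, W1 @ x + b1)               # encoder: ReLU
x_rec = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # decoder: sigmoid

L = np.sum((x - x_rec) ** 2)                   # reconstruction loss ||x - x'||^2
R = lam_a * np.sum(np.abs(h)) + lam_k * np.sum(np.abs(W1))
L_T = L + R                                    # total loss
```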

3. Training
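The terms activity_regularizer and kernel regularizer suggest a Keras implementation; a minimal training sketch under that assumption (layer sizes follow the architecture above, while the coefficients, optimizer, and epoch count are illustrative guesses and the data is a random stand-in):

```python
import numpy as np
import tensorflow as tf

lam_a, lam_k = 1e-4, 1e-5  # illustrative regularization coefficients

# 784 -> 196 -> 784 autoencoder with L1 activity and kernel regularization.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(
        196, activation='relu',
        activity_regularizer=tf.keras.regularizers.l1(lam_a),
        kernel_regularizer=tf.keras.regularizers.l1(lam_k)),
    tf.keras.layers.Dense(784, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='mse')

# Tiny random stand-in for the flattened training images.
x_train = np.random.rand(64, 784).astype(np.float32)
history = model.fit(x_train, x_train, epochs=1, batch_size=32, verbose=0)
```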

4. Loss plot

Loss ($L_T$, MSE reconstruction plus regularization) against the hyperparameter activity_regularizer ($\lambda_a$), for the training and test sets
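A plotting sketch for this figure, with placeholder loss values standing in for the results of retraining at each $\lambda_a$ (the numbers are not real results):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend
import matplotlib.pyplot as plt

lambdas = [1e-6, 1e-5, 1e-4, 1e-3]
train_loss = [0.010, 0.011, 0.015, 0.030]  # placeholder values only
test_loss = [0.012, 0.012, 0.016, 0.031]   # placeholder values only

plt.semilogx(lambdas, train_loss, marker='o', label='train')
plt.semilogx(lambdas, test_loss, marker='o', label='test')
plt.xlabel(r'$\lambda_a$ (activity regularizer)')
plt.ylabel('MSE loss')
plt.legend()
plt.savefig('loss_vs_lambda_a.png')
```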

5. Sparsity plot

Sparsity of $h$ against the hyperparameter activity_regularizer ($\lambda_a$), for the training and test sets
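Sparsity here is presumably measured as the fraction of hidden activations at (or numerically near) zero after ReLU; a NumPy sketch of that metric on stand-in activations:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in ReLU activations: 100 samples x 196 hidden units;
# ReLU zeroes roughly half of the Gaussian pre-activations.
h = np.maximum(0.0, rng.normal(0, 1, (100, 196)))

def sparsity(h, tol=1e-8):
    """Fraction of activations at (or numerically near) zero."""
    return float(np.mean(np.abs(h) <= tol))

print(sparsity(h))  # roughly 0.5 for this stand-in
```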

6. Weight matrix of the encoder

The encoder weight matrix, $W_1$, is shown as a grayscale heatmap. Each subplot shows one row of $W_1$ reshaped to 28x28.
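A sketch of arranging the 196 rows of a $W_1$-shaped matrix into a 14x14 grid of 28x28 tiles, ready for a single heatmap (random weights stand in for the trained matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.05, (196, 784))  # stand-in for the trained encoder weights

# Reshape each row to 28x28 and tile the 196 filters into a 14x14 grid.
tiles = W1.reshape(14, 14, 28, 28)
grid = tiles.transpose(0, 2, 1, 3).reshape(14 * 28, 14 * 28)

print(grid.shape)  # (392, 392)
```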

7. Original image vs latent space vs reconstructed image

8. Latent space similarity plot
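One common choice for a latent-space similarity plot is a pairwise cosine-similarity matrix between latent codes; the measure actually used here is not specified, so this NumPy sketch on stand-in codes is only one possible reading:

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.maximum(0.0, rng.normal(0, 1, (10, 196)))  # stand-in latent codes

# Pairwise cosine similarity between the 10 latent vectors.
norms = np.linalg.norm(H, axis=1, keepdims=True)
S = (H / norms) @ (H / norms).T

print(S.shape)  # (10, 10)
```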

9. Latent space in t-SNE space
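A minimal sketch assuming scikit-learn's TSNE is used to project the 196-dim latent codes to 2-D (stand-in data; the perplexity value is an illustrative choice):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
H = np.maximum(0.0, rng.normal(0, 1, (50, 196)))  # stand-in latent codes

# Project the latent codes to 2-D for visualization.
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(H)

print(emb.shape)  # (50, 2)
```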

10. K-means plot
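A sketch of clustering the latent codes with scikit-learn's KMeans (stand-in data; 10 clusters is an illustrative choice, e.g. one per digit class for 28x28 digit images):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
H = np.maximum(0.0, rng.normal(0, 1, (100, 196)))  # stand-in latent codes

# Cluster the 196-dim latent codes into k=10 groups.
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(H)

print(km.labels_.shape)  # (100,)
```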